Natural Language Engineering Techniques (Tehnici de Ingineria Limbajului Natural)
Lecture 11

The pragmatic level
Cognition triggered by affordances, conceptualisations and language formation

Lectures: Dan Cristea
Labs: Diana Trandabăț, Mihaela Onofrei, Daniela Gîfu, Ionuț Pistol

Cognition: integration of language, vision and action
(with information and images from K. Pastra and Y. Aloimonos)

Spectrum of research and systems
• Going back to AI and robotics:
  – the SHRDLU system verbalized visual changes in a 2D blocks scene,
  – medium translation systems (e.g. automatic sports commentators),
  – multimedia presentation systems (e.g. automatic creation of illustrated technical manuals),
  – robots controlled through voice

POETICON++ (2012-2016)
• Advocates that language is necessary for robots to generalise behaviours and perceptions
  – puts forward a concrete methodology for developing cognitive computational mechanisms
• There is growing evidence that language is inherently connected to action and perception
  – computational and engineering research in cognitive robotics

POETICON++ (2012-2016)
• Cognitive robotics research is geared towards closing the loop between robot sensing, robot acting and robot learning
  – language is usually kept as a communication interface with humans
  – yet language (through its hierarchical and compositional structure) can play a dynamic role and should be included in the loop
• POETICON++ showed why language is significant for the next generations of robots

iCub – a robot which listens and acts

iCub – the robot trained in the POETICON++ project
Language in correlation with visual abilities, motor capacities and experience of the world

Associating objects with words
• Machine learning methods: the learning module helps the robot develop resources on its own through supervised learning:
  – a human feeds the robot a word which corresponds to the image of an object in view;
  – the robot generates a simplified representation of the image of this object as a feature-value vector (colour and shape attributes);
  – once a number of associations are learned, the agent compares the representation of the image of any new object with the ones it knows;
  – using e.g. a nearest neighbour algorithm, the system associates the new object with the name of the known object it is most similar to

Grounding resources: PRAXICON
• A resource that links natural language and sensorimotor representations of concepts, with the aim of facilitating multimodal content integration in cognitive systems
  – going bottom-up in the resource (from sensorimotor representations to concepts), one gets a hierarchical composition of human behaviour,
  – going top-down (from concepts to sensorimotor representations), one gets intentionality-laden interpretations of those structures
Katerina Pastra (2008): PRAXICON: The Development of a Grounding Resource, Proceedings of the International Workshop on Human-Computer Conversation, Bellagio, Italy

Visual and language cognition: the team of Prof. Aloimonos

The robot hand performs an action: understanding sequences of images

Decomposing actions
• Three main ‘morpho-syntactic’ features characterize human actions and can be employed for defining action terminals and non-terminals:
  – tool complement (tc): the effector of a movement, i.e. a body part, a combination of body parts, or the extension of a body part with a graspable object used as a tool;
  – object complement (oc): any object affected by a tool-use action;
  – goal (g): the final purpose of an action sequence of any length or complexity
K. Pastra and Y. Aloimonos (2012): The minimalist grammar of action, Philosophical Transactions of the Royal Society B: Biological Sciences, 367(1585):103–117

Semantic reasoning based on affordances
• Suppose an object is identified as a rake (by vision-based classification) → use knowledge stored in a language-based semantic network (e.g. WordNet) → it can be chosen as a good candidate for bringing other objects closer
• It is difficult to express symbolic rules for the huge variety of objects that can be used as tools

Learn affordances by examples
• Complement the linguistic knowledge with a probabilistic model of affordances learned by the robot through self-exploration
  – the robot performs numerous experiments on a set of objects placed on a table;
  – one object is selected for grasping (the intermediate object) and another object is selected to be acted on;
  – no concept of “tool” exists yet: objects and actions are randomly selected during the learning process, and a causal model of the occurrences is obtained in the form of a Bayesian network

Visual descriptors help to learn affordances
• 2D visible silhouette:
  – segment the object into connected components of pixels (“blobs”)
  – describe blobs by basic sets of visual features
    • contour perimeter, area, polygonal approximation, convex hull, approximating ellipse, minimum-enclosing circle and minimum-enclosing rectangle
  – use these shape primitives to assess similarity between objects even if they belong to different conceptual classes, because they are very general and do not require a categorization of the object
    • e.g. toothpicks and straws might be categorically different but, due to their similar shapes, both might afford stirring the coffee

Visual descriptors help to learn affordances
• Measure the success of different triplets of action, tool and object, in which the tool and the object are characterized by features and the action is localized in an ontology
  – Get-closer with the rake the ball! → SUCCESS
  – Stir with the straw the coffee! → SUCCESS
  – Get-closer with the straw the ball! → FAIL
• Associate names to tools, actions and objects
• Fill in missing tools or objects: Stir the coffee!
  → X = straw, toothpick, tea spoon

Words help associate affordances
• When visual and word stimuli were presented concurrently:
  – iCub associated a label with the features characterizing an object and the learned affordances
  – after learning, when only a spoken input was presented for an action and an object, the affordances were remembered and the correct tool was selected

Finally, iCub decides by itself how to carry out a command

Conceptualization in Language
An Evolutionist View
This course is based on a talk given at the COST A31 project meeting, Budapest, May 2008
Some pictures belong to Luc Steels, SONY Laboratories, Paris

The ALEAR project (2008-2010)
• How did concepts arise in humans and how have they been expressed in language?
  – Why do we speak the way we do?
  – Why are there so many languages although most of us operate with the same concepts?
  – Is there a way to scientifically prove hypotheses about the evolution of languages?

The “Talking Heads” experiment
• Goal: how did language evolve?
  – In a community, over time: language emerges through self-organisation
  – In individuals: meaning is built in a cumulative growth process

ALEAR – Main Objective
“Carefully controlled experiments in which autonomous humanoid robots self-organise rich conceptual frameworks and communication systems with similar features as those found in human languages.”

Synthesis of intelligence approaches
• knowledge-based or symbolic: operationalizing models from logic, generative linguistics and cognitive psychology
• machine learning: copy intelligence by learning
• behaviour-based: put the study of AI on biological grounds (see also earlier cybernetics)
  → neural (including deep): copy the physical realization of the brain, make variations and study the effects

Approaches in studying the evolution of language
• Whole-systems approach: physical embodiment, sensory-motor, perception, conceptualisation, language
• Self-generated: as opposed to designed or acquired through inductive machine learning
• Multi-agent: as opposed to stand-alone
• Evolutionary: start from scratch and see how a communication system forms and further develops

Setting
• Cognitive agents:
  – Physical aspects:
    • body
    • sensors
    • articulators
    • physical location
    • objects and agents located in the environment
  – Mental properties:
    • behaviour
    • memory
    • lexicon
    • grammar
    • etc.
• The two aspects are separated: a real agent exists only when a virtual agent is loaded into a physical robot body

The robots
Physically embodied autonomous agents
  – Motor-sensory processing
    • perception,
    • movements,
    • actions
  – Conceptual processing
    • recognise objects,
    • learn a lexicon,
    • build representations of concepts
    • towards the development of grammar

Teleporting
• An agent develops categories in one location and enriches its learning experience by moving to another location
• The transmission of language from one generation to the next
• Intercultural exchange and language contact by migrating mental bodies to different parts of the world
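
The separation between the mental and the physical side of an agent, and the teleporting it allows, can be pictured with a small data-structure sketch. This is only an illustration under stated assumptions: the class names `VirtualAgent` and `RobotBody`, the `teleport` function, the location names and the single lexicon entry are all hypothetical and are not taken from the Talking Heads or ALEAR software.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VirtualAgent:
    """Mental properties only (hypothetical fields): a lexicon and a memory."""
    name: str
    lexicon: dict = field(default_factory=dict)  # word -> meaning
    memory: list = field(default_factory=list)   # locations visited so far


@dataclass
class RobotBody:
    """Physical aspects only: a body at a fixed location, hosting at most one mind."""
    location: str
    mind: Optional[VirtualAgent] = None

    def is_real_agent(self) -> bool:
        # A real agent exists only while a virtual agent is loaded in the body.
        return self.mind is not None


def teleport(source: RobotBody, target: RobotBody) -> None:
    """Move the virtual agent to another body; its lexicon travels with it."""
    if source.mind is None:
        raise ValueError("source body hosts no virtual agent")
    if target.mind is not None:
        raise ValueError("target body is already occupied")
    target.mind, source.mind = source.mind, None
    target.mind.memory.append(target.location)


# Illustrative locations and lexicon entry (assumptions, not course data).
paris = RobotBody("Paris")
tokyo = RobotBody("Tokyo")
paris.mind = VirtualAgent("a1", lexicon={"mo": "right"})
teleport(paris, tokyo)
```

After the call, the body in the first location is an empty shell, while the second body hosts the same mind with its lexicon intact, which is what lets language contact between sites be studied.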
Interactive games
• Rules:
  – agents can interact only in response to stimuli coming from the external environment
  – they are bound (programmed) to maximise their communicative success
• Acquisitions:
  – guessing games: develop the vocabulary and abstract concepts by saying/guessing/pointing
  – develop space and time conceptualisations: events
  – develop rudiments of syntax

The protocol of a game
- One agent (the speaker) chooses an object (the focus), conceptualises it and, without pointing to it, emits a string description conforming to its lexicon
- The second agent (the hearer) parses that lexical description, matches it against its own conceptualisations and points to the object it believes is the focus
- If they match, the game is successful → both agents increase their confidence in the mapping between conceptual space and lexicon for the words they used
- If the game fails → they decrease their confidence and the hearer learns another connection between the real focus and the description string

(COST A31 – WG1, Budapest, May 2008)

Perception and categorisation
• Sensory channels: software processes interpreting specific real-world information:
  – HPOS (horizontal position)
  – VPOS (vertical position)
  – GRAY (gray level)
  – others
• Domain: 0-10 (continuous), discretised to a number of discrete values → categorisation
• Categorisation can be specific to individuals

What is proved?
• A lexicon in a single agent
  – new words are invented or adopted
  – scores of association between forms (words) and meanings (concepts) go up and down
  – a “virgin”, “newly born” agent will catch up with a lexicon already existing in a population
• A common lexicon that stabilizes in a group of agents
  – however, differences can still coexist
  – the lexicon is sensitive to the growth or reduction of the population
  – the lexicon can absorb some shocks of contact with other groups, or can be destabilized

What do the plots show?
• Increasing the population size
• The agents create a word without knowing that a word already exists somewhere in the population (as it takes time to propagate) → the risk of synonymy increases
• However, steady progress towards an effective communication system is observed

What do the plots show?
• What happens when two populations interact?
• There is an initial destabilisation period in which coherence is low, as ambiguity increases
• However, the new community catches up and a new common lexicon emerges, abundant in synonyms

How to prove the origins of language?
• In a simplified form: language = lexicon + grammar
  – The lexicon gives names to concepts and objects: the guessing games
  – The grammar expresses relations between concepts and objects: how can this be put in terms of interactions between agents?

Guessing games: implicit assumptions
• Experiments are made in a controlled setting that simplifies many aspects:
  – The world is simplified to the content of the table (the scene)
  – The agents have their attention focused on the scene
  – Their “aim” is to identify objects (they are motivated)
  – The agents have a set of channels sufficient to bring out the identifying properties of the objects populating the scene
  – The maximum vocabulary sufficient to cover the concepts, as values produced by channels, is strictly limited
  – The words used by the agents apply to property values and not to objects
• This setting is assumed (given, programmed)

Establishing settings for exercising the birth of a grammar
• Setting 1:
  – The world: the content of the table (the scene)
  – Attention: focused on the scene
  – Motivation: identifying objects
  – Channels: identify properties of objects (enriched)
  – Maximum vocabulary: strictly limited
  – The agents already share a background vocabulary for naming properties of objects (as in phase 1), not objects directly
  – A single property is not enough for disambiguation → necessity to use combinations of words to express conjunctions
  – Implicit supposition: combinations express conjunction and not disjunction

Conclusions of a set of experiments of this kind
• They will not give rise to a grammar for expressing conjunctions: the implicit assumption was that putting words together restricts the selection (in conformity with most modern languages)
• The order of words is not important: “red circular” or “circular red”

One step further in building a grammar
• Setting 2:
  – The world: the content of the table (the scene)
  – Attention: focused on the scene
  – Motivation: identifying spatial relations among objects
  – Channels: identify properties of objects BUT ALSO spatial relations among objects
  – Maximum vocabulary: strictly limited
  – The agents have as background a common vocabulary for naming properties of objects (as in phase 1), not objects directly
  – Implicit supposition: in a linear expression Obj1 R Obj2, the focus is Obj1

But grammar is all about form
“HREL[right-of(obj2)]” expresses the concept “whatever is to the right of obj2”
One way to say that the object in this relation is obj1: “obj1 lexical-item-for(right-of) obj2”
But the agent knows a term for “right”: “mo”
Then it might combine this term with a new word expressing the concept of “relation”, for instance “ga”: “mo-ga”

How to diminish the amount of initial assumptions?
• Remember our Setting 2:
  – Implicit supposition: in a linear expression Obj1 R Obj2, the focus is Obj1
• This is a direct and artificial intrusion into the very heart of the birth process of the language!
• Solution: parameterize all grammatical features of the language and let them evolve naturally

• Word order and prepositions
  John gave Mary a book in the library
• Affixes
  Das Mädchen gibt den schweren Koffer ihres Bruders den Freunden
  The girl gives the heavy suitcase (of) her brother (to) the friends
• Inflection
  Brutus Marcello librum dedit
  Brutus Marcellus book gave
  “Brutus gave a book to Marcellus” [Source: Palmer, p. 8]
• Particles
  Tanaka-san wa Tokyo de o-to-san ni atta
  Tanaka TOPIC Tokyo (in) father (loc) meet
  “Tanaka met his father in Tokyo”
(from Luc Steels)

Examples of semantic roles:
Agent: the instigator of the event
Counter-agent: the force or resistance against which the action is carried out
Object: the entity that moves or changes or whose position or existence is in consideration
Result: the entity that comes into existence as a result of the action
Instrument: the stimulus or immediate physical cause of the event
Source: the place from which something moves
Goal: the place to which something moves
Experiencer: the entity which receives or accepts or experiences or undergoes the effect of an action
(Source: Fillmore; from Luc Steels)

Fluid Construction Grammar
• Structures to represent the information needed in language processing about a specific sentence (feature structures)
• Structures to represent the lexical and grammatical constructions (rules)
• Operations of Unify and Merge
• Structures specifying how new rules are built (templates)
A chemical metaphor
(from Luc Steels)

Constructions
• Semantic and syntactic categories do not operate in isolation
• They are part of frames (schemas, patterns)
• Constructions are mappings from semantic to syntactic schemas
• Constructions can also add meaning beyond the meaning of the parts, and add additional form
Fillmore, Kay, Michaelis, Croft, Goldberg, …
(from Luc Steels)

Conclusion
• Language is considered from an evolutionist point of view
• Lexicon formation: proved experimentally
• Grammar and dialogue: under study in ALEAR
  – Perhaps Fluid Construction Grammar
    • integrates syntax and semantics (Fillmore's roles)
    • unification and merge mechanisms
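
To make the lexicon-formation result more concrete, the guessing-game protocol and the channel discretisation from the earlier slides can be sketched in a few lines of Python. This is a minimal sketch under assumptions, not the Talking Heads implementation: only the GRAY channel is used, the number of bins (5), the initial score (0.5), the score step (0.1), the population size and the word forms `w0`, `w1`, … are all illustrative choices.

```python
import itertools
import random

_new_word_ids = itertools.count()  # invented word forms never collide


def categorise(value, bins=5, lo=0.0, hi=10.0):
    """Discretise a continuous channel value in [lo, hi] into a category index."""
    return min(int((value - lo) / (hi - lo) * bins), bins - 1)


class Agent:
    def __init__(self):
        self.scores = {}  # (word, meaning) -> confidence in [0, 1]

    def word_for(self, meaning):
        """Best-scored word for a meaning; invent a new word if none is known."""
        known = [(s, w) for (w, m), s in self.scores.items() if m == meaning]
        if known:
            return max(known)[1]
        word = f"w{next(_new_word_ids)}"
        self.scores[(word, meaning)] = 0.5
        return word

    def meaning_of(self, word):
        """Best-scored meaning for a word, or None if the word is unknown."""
        known = [(s, m) for (w, m), s in self.scores.items() if w == word]
        return max(known)[1] if known else None

    def adjust(self, word, meaning, delta):
        old = self.scores.get((word, meaning), 0.5)
        self.scores[(word, meaning)] = min(1.0, max(0.0, old + delta))


def play_game(speaker, hearer, scene, rng):
    """One guessing game over a scene of GRAY values; returns True on success."""
    focus = rng.choice(scene)
    meaning = ("GRAY", categorise(focus))
    word = speaker.word_for(meaning)   # speaker describes the focus, no pointing
    guess = hearer.meaning_of(word)    # hearer parses the description
    pointed = next((o for o in scene if ("GRAY", categorise(o)) == guess), None)
    if pointed == focus:
        speaker.adjust(word, meaning, +0.1)
        hearer.adjust(word, meaning, +0.1)
        return True
    speaker.adjust(word, meaning, -0.1)
    if guess is not None:
        hearer.adjust(word, guess, -0.1)
    hearer.scores.setdefault((word, meaning), 0.5)  # learn the real focus
    return False


rng = random.Random(0)
population = [Agent() for _ in range(5)]
scene = [1.0, 5.0, 9.0]  # three objects falling into distinct GRAY categories
results = []
for _ in range(2000):
    speaker, hearer = rng.sample(population, 2)
    results.append(play_game(speaker, hearer, scene, rng))
```

In this simplified setting a game can fail only when the hearer has never met the word, and each failure teaches it that word, so failures are bounded and communicative success rises over time. Agents that invent a word before hearing an existing one create synonyms, which matches the qualitative behaviour described on the slides.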